107 research outputs found

    Exact distribution for the local score of one i.i.d. random sequence

    Let $X_1,\dots,X_n$ be a sequence of i.i.d. positive or negative integer-valued random variables and $H_n=\max_{i\leq j\leq n}(X_i+\cdots+X_j)$ the local score of the sequence. The exact distribution of $H_n$ is obtained using a simple Markov chain. This result is applied to the scoring of DNA and protein sequences in molecular biology.
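As a rough illustration of the idea in this abstract, the sketch below computes a local score with a Lindley-type recursion and then obtains P(H_n >= a) by propagating the law of that recursion as a Markov chain with an absorbing state at a. It is a minimal sketch under stated assumptions, not the authors' construction: the function names are hypothetical, and the empty segment is allowed, so the score is never negative.

```python
def local_score(scores):
    """Local score via the recursion U_k = max(0, U_{k-1} + x_k).

    Assumption: the empty segment is allowed, so the result is >= 0.
    """
    best = current = 0
    for x in scores:
        current = max(0, current + x)
        best = max(best, current)
    return best


def prob_local_score_at_least(dist, n, a):
    """P(H_n >= a) for n i.i.d. integer scores with law `dist` ({value: prob}).

    Tracks the law of U_k on the states 0..a, where state a is absorbing:
    once the running score reaches a, the local score is at least a.
    """
    law = [0.0] * (a + 1)
    law[0] = 1.0  # the chain starts at U_0 = 0
    for _ in range(n):
        new = [0.0] * (a + 1)
        new[a] = law[a]  # absorbed mass stays absorbed
        for u in range(a):
            if law[u] == 0.0:
                continue
            for x, p in dist.items():
                v = min(a, max(0, u + x))  # reflect at 0, absorb at a
                new[v] += law[u] * p
        law = new
    return law[a]
```

For example, `prob_local_score_at_least({-1: 0.5, 1: 0.5}, 3, 2)` gives the probability that three fair ±1 scores contain a segment summing to at least 2.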

    A simple proof of the Wirsching-Goodwin representation of integers connected to 1 in the $3x+1$ problem

    This paper gives a simple proof of the Wirsching-Goodwin representation of integers connected to 1 in the $3x+1$ problem (see \cite{Wirsching} and \cite{Goodwin}). This representation makes it possible to compute all the ascending Collatz sequences $(f^{(i)}(n),\: i=1,b-1)$ with a last value $f^{(b)}(n)=1$. Other periodic sequences connected to $1$ are also identified.

    A statistical approach for array CGH data analysis

    BACKGROUND: Microarray-CGH experiments are used to detect and map chromosomal imbalances, by hybridizing targets of genomic DNA from a test and a reference sample to sequences immobilized on a slide. These probes are genomic DNA sequences (BACs) that are mapped on the genome. The signal has a spatial coherence that can be handled by specific statistical tools. Segmentation methods seem to be a natural framework for this purpose. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose BACs share the same relative copy number on average. We model a CGH profile by a random Gaussian process whose distribution parameters are affected by abrupt changes at unknown coordinates. Two major problems arise: determining which parameters are affected by the abrupt changes (the mean and the variance, or the mean only), and selecting the number of segments in the profile. RESULTS: We demonstrate that existing methods for estimating the number of segments are not well suited to array CGH data, and we propose an adaptive criterion that detects previously mapped chromosomal aberrations. The performance of this method is discussed based on simulations and publicly available data sets. Then we discuss the choice of modeling for array CGH data and show that the model with a homogeneous variance is well suited to this context. CONCLUSIONS: Array CGH data analysis is an emerging field that needs appropriate statistical tools. Process segmentation and model selection provide a theoretical framework that allows precise biological interpretations. Adaptive methods for model selection give promising results concerning the estimation of the number of altered regions on the genome.
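To make the segmentation model concrete, the sketch below fits a piecewise-constant mean with a common variance by dynamic programming, for a fixed number of segments. This is an illustrative assumption about how such a model can be fitted, not the procedure reported in the paper: the function name `segment_means` is hypothetical, the number of segments `k` is supplied by the caller (with 1 <= k <= len(y)), and the adaptive criterion for choosing it is not reproduced.

```python
def segment_means(y, k):
    """Split y into k contiguous segments minimizing the within-segment
    sum of squared deviations from each segment mean.

    Returns (cost, starts) where starts lists the first index of each segment.
    """
    n = len(y)
    # Prefix sums of y and y^2 give each segment cost in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations of y[i:j] around its mean (i < j)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting y[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    arg = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    arg[m][j] = i  # last segment is y[i:j]

    # Backtrack from the full profile to recover the segment starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = arg[m][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts
```

On a toy profile such as `[0, 0, 0, 5, 5, 5, 1, 1]` with `k = 3`, the recovered segments start at the positions where the level changes.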

    Classification and estimation in the Stochastic Blockmodel based on the empirical degrees

    The Stochastic Blockmodel [16] is a mixture model for heterogeneous network data. Unlike the usual statistical framework, new nodes give additional information about the previous ones in this model. Thereby the distribution of the degrees concentrates at points conditionally on the node class. We show under a mild assumption that classification, estimation and model selection can actually be achieved with no more than the empirical degree data. We provide an algorithm able to process very large networks and consistent estimators based on it. In particular, we prove a bound on the probability of misclassification of at least one node, including when the number of classes grows.
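One simple way to exploit degree concentration is to sort the nodes by degree and cut the sorted sequence at its largest gaps. The sketch below does exactly that. It is a hedged illustration only and may differ from the algorithm in the paper: the function name, the adjacency-dictionary input, and the fixed number of classes are all assumptions made here.

```python
def classify_by_degree_gaps(adjacency, n_classes):
    """Group nodes into n_classes by cutting the sorted degree sequence
    at its (n_classes - 1) largest gaps.

    adjacency: dict mapping each node to the set of its neighbours.
    Returns a dict node -> class label (0 = lowest-degree group).
    """
    degrees = {v: len(nb) for v, nb in adjacency.items()}
    # Sort nodes by degree (ties broken by node id, so ids must be comparable).
    order = sorted(degrees, key=lambda v: (degrees[v], v))
    # Gap between each pair of consecutive sorted degrees.
    gaps = [
        (degrees[order[i + 1]] - degrees[order[i]], i)
        for i in range(len(order) - 1)
    ]
    # Positions after which a new class starts.
    cuts = {i for _, i in sorted(gaps, reverse=True)[: n_classes - 1]}
    labels, current = {}, 0
    for i, v in enumerate(order):
        labels[v] = current
        if i in cuts:
            current += 1
    return labels
```

With two hub nodes linked to every other node and four peripheral nodes linked only to the hubs, calling the function with `n_classes=2` separates hubs from peripheral nodes.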

    Accuracy of Variational Estimates for Random Graph Mixture Models

    Network analysis has attracted growing interest in recent years. Data in the form of measurements of relationships between items are increasingly available, and they abandon the usual individuals-by-variables structure of a data set for an individuals-by-individuals structure. Such "relational" data are very often presented as a graph, even though this representation has its limits, notably when the number of individuals exceeds one hundred. The graphical representation of network data is then attractive, but requires a synthetic model. The oldest and most widely used graph model is the Erdös-Rényi model, whose average or asymptotic properties are known. The literal expression of the likelihood of this model is very simple, but its computation time grows exponentially with the number of individuals. Using the usual estimation algorithms such as E-M is not feasible. A variational approach has been used as an alternative to implement an algorithm for estimating the model parameters, including for very large networks (Daudin et al. 2008). The statistical properties of the estimators produced by this approach are, however, poorly understood. The aim is to study the quality of these estimators and to prove their convergence.

    Asymptotic behavior of the local score of independent and identically distributed random sequences

    Let $(X_n)_{n\geq 1}$ be a sequence of real random variables. The local score is $H_n=\max_{1\leq i<j\leq n}(X_i+\cdots+X_j)$. If $(X_n)_{n\geq 1}$ is a “good” Markov chain under its invariant measure and the $X_i$ are centered, we prove that $H_n/\sqrt{n}$ converges in distribution to $B_1^*$ as $n\to+\infty$, where $B_1^*=\max_{0\leq u\leq 1}|B_u|$ and $(B_u,\ u\geq 0)$ is a standard Brownian motion with $B_0=0$. If $(X_n)_{n\geq 1}$ is a sequence of i.i.d. random variables with $E(X_1)=\delta/\sqrt{n}$ and $\mathrm{Var}(X_1)=\sigma^2>0$, we prove the convergence of $H_n/\sqrt{n}$ to $\sigma\xi_{\delta/\sigma}$, where $\xi_\gamma=\max_{0\leq u\leq 1}\{(B(u)+\gamma u)-\min_{0\leq s\leq u}(B(s)+\gamma s)\}$. We approximate the probability distribution function of $\xi_\gamma$ and determine the asymptotic behavior of $P(\xi_\gamma\geq a)$ as $a\to+\infty$.
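The shape of the limiting functional can be read off from a standard rewriting of the local score in terms of partial sums, sketched here as a reasoning aid. The partial-sum notation $S_k$ is introduced for illustration and does not appear in the abstract.

```latex
% Assumption: S_0 = 0 and S_k = X_1 + \cdots + X_k (notation introduced here).
\begin{align*}
X_i + \cdots + X_j &= S_j - S_{i-1}, \\
H_n = \max_{1 \le i < j \le n} (X_i + \cdots + X_j)
    &= \max_{2 \le j \le n} \Bigl( S_j - \min_{0 \le k \le j-2} S_k \Bigr).
\end{align*}
```

Heuristically, replacing the partial sums by a Brownian motion with drift gives a "current value minus running minimum" functional of the same form as the limit variable in the abstract.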

    Extracting biological information from DNA arrays: an unexpected link between arginine and methionine metabolism in Bacillus subtilis

    BACKGROUND: In global gene expression profiling experiments, variation in the expression of genes of interest can often be hidden by general noise. To determine how biologically significant variation can be distinguished under such conditions we have analyzed the differences in gene expression when Bacillus subtilis is grown either on methionine or on methylthioribose as sulfur source. RESULTS: An unexpected link between arginine metabolism and sulfur metabolism was discovered, enabling us to identify a high-affinity arginine transport system encoded by the yqiXYZ genes. In addition, we tentatively identified a methionine/methionine sulfoxide transport system which is encoded by the operon ytmIJKLMhisP and is presumably used in the degradation of methionine sulfoxide to methane sulfonate for sulfur recycling. Experimental parameters resulting in systematic biases in gene expression were also uncovered. In particular, we found that the late competence operons comE, comF and comG were associated with subtle variations in growth conditions. CONCLUSIONS: Using variance analysis it is possible to distinguish between systematic biases and relevant gene-expression variation in transcriptome experiments. Co-variation of metabolic gene expression pathways was thus uncovered linking nitrogen and sulfur metabolism in B. subtilis